A multi-level context-dependent prosodic model applied to durational modeling
Abstract
We present in this article a multi-level prosodic model based on the estimation of prosodic parameters on a set of well-defined linguistic units. Different linguistic units are used to represent different scales of prosodic variation (local and global forms) and thus to estimate, independently at each level, the linguistic factors that can explain the variations of prosodic parameters. This model is applied to the modeling of syllable-based durational parameters on two read-speech corpora: laboratory and acted speech. Compared to a syllable-based baseline model, the proposed approach improves performance in terms of the temporal organization of the predicted durations (correlation score) and reduces the model's complexity, while showing comparable performance in terms of relative prediction error.
Index Terms: speech synthesis, prosody, multi-level model, context-dependent model.
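The two evaluation criteria named in the abstract, a correlation score and a relative prediction error, can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the formulas below are assumptions: Pearson correlation between observed and predicted syllable durations, and the mean absolute error relative to the observed duration. The function names and the sample durations are hypothetical.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation between observed and predicted durations.

    Assumed here as the "correlation score"; the source does not
    specify which correlation measure was used.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed over all syllables.

    Assumed definition of "relative prediction error"; observed
    durations must be strictly positive.
    """
    ratios = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical syllable durations in milliseconds, for illustration only.
    observed_ms = [180.0, 120.0, 240.0, 150.0]
    predicted_ms = [170.0, 130.0, 220.0, 160.0]
    print("correlation:", pearson_correlation(observed_ms, predicted_ms))
    print("relative error:", mean_relative_error(observed_ms, predicted_ms))
```

Under these assumed definitions the two measures capture different things: the correlation reflects whether the predicted durations follow the same temporal pattern as the observed ones, while the relative error reflects how far each prediction is from its target.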
Similar resources
Modeling the durational difference of stressed vs. unstressed syllables
Speech production exhibits temporal coherence among speech gestures, and also systematic modulation of durational patterns as a function of the hierarchical level of prosodic structure, e.g., the foot. Intergestural coherence has been understood with reference to dynamic coupling within an ensemble of planning oscillators, and a coupled oscillator model of intergestural timing has been employed...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
Speech rhythm as durational marking of prosodic heads and edges. Evidence from Catalan, English, and Spanish
Data from a total of 24 speakers reading 720 utterances from Catalan, English, and Spanish show that differences in rhythm metrics emerge even when syllable structure and vowel reduction are controlled for in the experimental materials, strongly suggesting that important differences in timing exist in these languages, and thus that the rhythmic percept is not solely dependent on these two phono...
Integration of context-dependent durational knowledge into HMM-based speech recognition
This paper presents research on integrating context-dependent durational knowledge into HMM-based speech recognition. The first part of the paper presents work on obtaining relations between the parameters of the context-free HMMs and their durational behaviour, in preparation for the context-dependent durational modelling presented in the second part. Duration integrati...
Durational Cues and Prosodic Phrasing in French
Studies addressing prosodic constituency in French generally agree on two levels of phrasing (accentual phrase, AP, and intonation phrase, IP), while the existence of an intermediate level of phrasing (intermediate phrase, ip) is still controversial. In this study we examine durational cues in a read speech corpus at normal and fast rates in which the target syllable was either adjacent to a pr...
Publication date: 2009